Loading of Data
The data were downloaded from “Danmarkstatistik” with its API, imported into OpenRefine and cleaned there. I will do the rest of the processing here in RStudio.
I load the data for 2007: the number of 18-year-olds who moved to Copenhagen, broken down by the municipality they moved from. I then check the first 6 rows to confirm the data look correct and to see the column types.
library(tidyverse)  # read_csv(), %>%, mutate(), rename(), ggplot()
data07 <- read_csv("aar2007.csv", show_col_types = FALSE)
head(data07)
## # A tibble: 6 × 4
## TID FRAKOMMUNE ALDER INDHOLD
## <dbl> <chr> <chr> <dbl>
## 1 2007 Koebenhavn 18 år 0
## 2 2007 Frederiksberg 18 år 63
## 3 2007 Dragoer 18 år 10
## 4 2007 Taernby 18 år 41
## 5 2007 Albertslund 18 år 10
## 6 2007 Ballerup 18 år 21
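Before going further, it can help to assert that the file has the structure the later steps rely on. A minimal sketch, assuming the column names shown in the output above; the small tibble is rebuilt from the first three printed rows only so the snippet runs on its own (with the real data, run the checks on data07 directly).

```r
library(tibble)

# Stand-in for data07, rebuilt from the first rows printed above
data07_check <- tibble(
  TID = c(2007, 2007, 2007),
  FRAKOMMUNE = c("Koebenhavn", "Frederiksberg", "Dragoer"),
  ALDER = c("18 år", "18 år", "18 år"),
  INDHOLD = c(0, 63, 10)
)

# Columns that the later steps depend on
required_cols <- c("TID", "FRAKOMMUNE", "ALDER", "INDHOLD")
missing_cols <- setdiff(required_cols, names(data07_check))
stopifnot(length(missing_cols) == 0)

# The file should cover a single year and a single age group
stopifnot(length(unique(data07_check$TID)) == 1,
          length(unique(data07_check$ALDER)) == 1)
```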
Next, I want to see what percentage of each municipality's 18-year-olds the people who moved make up. I again use the Danmarkstatistik API, this time to get the total number of 18-year-olds in each municipality.
aar1808 <- read_csv("antal18.csv", show_col_types = FALSE)
head(aar1808)
## # A tibble: 6 × 2
## OMRÅDE INDHOLD
## <chr> <dbl>
## 1 Koebenhavn 4042
## 2 Frederiksberg 648
## 3 Dragoer 142
## 4 Taernby 471
## 5 Albertslund 435
## 6 Ballerup 541
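I combine the two datasets by adding the population totals as a new column with the mutate function. This only works because both files list the municipalities in exactly the same order. As we can see below, the data07 set now has 5 variables.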
data07 %>%
mutate(Total18=aar1808$INDHOLD) -> data07
head(data07)
## # A tibble: 6 × 5
## TID FRAKOMMUNE ALDER INDHOLD Total18
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 2007 Koebenhavn 18 år 0 4042
## 2 2007 Frederiksberg 18 år 63 648
## 3 2007 Dragoer 18 år 10 142
## 4 2007 Taernby 18 år 41 471
## 5 2007 Albertslund 18 år 10 435
## 6 2007 Ballerup 18 år 21 541
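Adding Total18 with mutate() matches rows purely by position, so if one file were sorted differently the totals would silently land on the wrong municipalities. Joining on the municipality name avoids that assumption. A minimal sketch with dplyr's left_join(), assuming the names in both files are spelled identically; `kommune_navn` is a hypothetical stand-in for the OMRÅDE column (with the real data: `aar1808 %>% rename(Total18 = INDHOLD)` and `by = c("FRAKOMMUNE" = "OMRÅDE")`). The values are the first rows printed above, with the population file deliberately shuffled.

```r
library(dplyr)
library(tibble)

# Stand-ins for data07 and aar1808, using rows printed above
moves <- tibble(
  FRAKOMMUNE = c("Koebenhavn", "Frederiksberg", "Dragoer"),
  INDHOLD = c(0, 63, 10)
)
# Population totals deliberately listed in a different order
population <- tibble(
  kommune_navn = c("Dragoer", "Koebenhavn", "Frederiksberg"),
  Total18 = c(142, 4042, 648)
)

# Match rows on the municipality name instead of on row position
joined <- moves %>%
  left_join(population, by = c("FRAKOMMUNE" = "kommune_navn"))

# Position-based mutate() gives each row whatever total sits in the same row
positional <- moves %>%
  mutate(Total18 = population$Total18)
```

left_join() keeps every row of the left table, so a municipality whose name does not match gets NA in Total18, which makes spelling mismatches easy to find with `filter(is.na(Total18))`.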
data07 %>%
mutate(procent = (INDHOLD/Total18)*100) -> data07
# I add the new column to the existing dataset, since I want to keep working with it.
I use the mutate function because I want to compute a new column from data that are already in my dataset. The new column, procent, shows the percentage of each municipality's 18-year-olds who moved to Copenhagen. I expect the values to be small everywhere.
Since I use two very specialised functions, mapDK() from the mapDK package and municipalityDK() from the leafletDK package, the municipality names in my data must match the names they expect exactly. I already cleaned up the special characters in OpenRefine, but I also need to rename the column and convert the municipality names to lowercase.
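Because the formula multiplies by 100, procent is a percentage (per hundred); a per mille value would multiply by 1000 instead. A small worked example using three municipalities from the output above shows both scales side by side: Frederiksberg gives 63 / 648 × 100 ≈ 9.72 %, which is about 97 per mille.

```r
library(dplyr)
library(tibble)

# Counts and totals for three municipalities, from the output above
shares <- tibble(
  kommune = c("Frederiksberg", "Dragoer", "Taernby"),
  INDHOLD = c(63, 10, 41),
  Total18 = c(648, 142, 471)
) %>%
  mutate(
    procent  = INDHOLD / Total18 * 100,   # per hundred
    promille = INDHOLD / Total18 * 1000   # per thousand
  )

round(shares$procent, 2)
```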
#for 2007
data07 %>%
mutate(FRAKOMMUNE = tolower(FRAKOMMUNE)) -> data07
data07 %>%
rename(kommune = FRAKOMMUNE) -> data07
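The two steps can also be written as one pipeline, which avoids overwriting data07 twice. A minimal sketch on a small stand-in tibble built from values shown earlier; with the real data the same pipeline would start from data07.

```r
library(dplyr)
library(tibble)

# Stand-in for data07 with a few municipalities from the output above
data07_demo <- tibble(
  FRAKOMMUNE = c("Koebenhavn", "Frederiksberg", "Taernby"),
  procent = c(0, 63 / 648 * 100, 41 / 471 * 100)
)

# Rename the column and lowercase the names in a single pipeline
data07_demo <- data07_demo %>%
  rename(kommune = FRAKOMMUNE) %>%
  mutate(kommune = tolower(kommune))
```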
Here I use mapDK to create a map showing the percentage of 18-year-olds who moved. Low values are shown in dark blue and higher values in lighter shades.
library(mapDK)
kommunekort1 <- mapDK(values = 'procent', id = 'kommune', data = data07)
## Warning in mapDK(values = "procent", id = "kommune", data = data07): Some id not
## recognized: taernby
## Warning in mapDK(values = "procent", id = "kommune", data = data07): You
## provided no data for the following ids: taarnby
kommunekort1
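The warnings come from a spelling mismatch: my cleaned data spell Tårnby as “taernby”, while mapDK expects “taarnby”, so Tårnby is left out of the map. Christiansø is not in my data at all, so it cannot be shown either. Copenhagen has the value 0, since you cannot move from Copenhagen to Copenhagen.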
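Since the warning itself names the expected id, the mismatch can be fixed by recoding that one name before mapping. A minimal sketch, assuming “taarnby” (taken from the warning text) is the spelling mapDK expects; whether municipalityDK accepts the same spelling is not confirmed here. The tibble is a small stand-in for data07.

```r
library(dplyr)
library(tibble)

# Stand-in for data07 after lowercasing and renaming
data07_demo <- tibble(
  kommune = c("koebenhavn", "frederiksberg", "taernby"),
  procent = c(0, 63 / 648 * 100, 41 / 471 * 100)
)

# Replace only the spelling mapDK did not recognise; other names are unchanged
data07_demo <- data07_demo %>%
  mutate(kommune = if_else(kommune == "taernby", "taarnby", kommune))
```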
I find this map hard to read, so I will now try municipalityDK instead.
This makes the map somewhat easier to read, and the map is also interactive: you can click on a municipality to see its value.
library(leafletDK)       # municipalityDK()
library(leaflet.extras)  # setMapWidgetStyle()
kommunekort2 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = "GnBu") %>%
setMapWidgetStyle(list(background = "white"))
## Loading required package: sp
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort2
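I want to make the colour contrast even clearer, so the map is easier to use and understand.
# Assumed definition: colfunc is not defined in this section; a blue-to-red ramp matches the description below
colfunc <- colorRampPalette(c("blue", "red"))
kommunekort3 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = colfunc(10)) %>%
setMapWidgetStyle(list(background = "white"))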
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort3
The colours now run from blue to red, which makes the map much easier to read and understand.
Frederiksberg is the municipality with the highest value. Now I want to examine whether that has changed over time.
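The palette passed as `pal = colfunc(10)` is just a vector of colour codes. Assuming colfunc was built with grDevices' colorRampPalette() from blue to red (it is not defined in this section), this sketch shows the ten colours the map would use, running from pure blue to pure red.

```r
# Assumed definition of colfunc: a linear ramp from blue to red
colfunc <- colorRampPalette(c("blue", "red"))

# Ten hex colour codes, one per colour step on the map
palette10 <- colfunc(10)
palette10
```

Choosing two hues far apart (blue and red) rather than shades of one hue is what makes neighbouring values easier to tell apart than with the single-hue "GnBu" scale.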
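The ranking can also be checked in code rather than read off the map. A minimal sketch on the six municipalities printed earlier only; with the full dataset the same pipeline would be `data07 %>% arrange(desc(procent))`.

```r
library(dplyr)
library(tibble)

# The six municipalities printed earlier, with the shares computed from them
sample07 <- tibble(
  kommune = c("koebenhavn", "frederiksberg", "dragoer",
              "taernby", "albertslund", "ballerup"),
  INDHOLD = c(0, 63, 10, 41, 10, 21),
  Total18 = c(4042, 648, 142, 471, 435, 541)
) %>%
  mutate(procent = INDHOLD / Total18 * 100)

# Rank municipalities by share, highest first, and keep the top three
top <- sample07 %>%
  arrange(desc(procent)) %>%
  slice_head(n = 3)
```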
I use the Danmarkstatistik API again and load the data:
fre18 <- read_delim("https://api.statbank.dk/v1/data/FLY66/CSV?delimiter=Semicolon&TILKOMMUNE=101&FRAKOMMUNE=147&ALDER=18&Tid=*", delim = ";", show_col_types = FALSE)
I then make a simple plot with ggplot to see whether the 2007 value is typical or an outlier.
# colour is a fixed setting, so it goes in geom_line() rather than in aes()
ggplot(fre18, aes(x = TID, y = INDHOLD)) + geom_line(colour = "red")
From the plot I can see that 2007 is not an outlier, so the finding that Frederiksberg is the municipality from which moving to Copenhagen is most common seems very likely to hold.
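The query string in the URL encodes the filters: table FLY66, moves to municipality 101 (Copenhagen) from municipality 147 (Frederiksberg), age 18, and `Tid=*` for all years. A minimal sketch of building the same URL from named filters, which makes it easier to swap in another municipality code; `statbank_csv_url` is a hypothetical helper, not part of any package, and it does not URL-encode the values.

```r
# Hypothetical helper: build a Statbank CSV URL from a table id and filters
statbank_csv_url <- function(table, filters, delimiter = "Semicolon") {
  query <- paste0(names(filters), "=", unlist(filters), collapse = "&")
  paste0("https://api.statbank.dk/v1/data/", table,
         "/CSV?delimiter=", delimiter, "&", query)
}

url <- statbank_csv_url(
  "FLY66",
  list(TILKOMMUNE = 101,   # to Copenhagen
       FRAKOMMUNE = 147,   # from Frederiksberg
       ALDER = 18,         # age 18
       Tid = "*")          # all years
)
```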
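Judging “not an outlier” by eye can be backed up with a simple rule of thumb: flag a year if it lies more than k standard deviations from the mean of the other years. A minimal sketch; the function name and the threshold k = 2 are assumptions, and the numbers below are made-up toy values, not the Frederiksberg series. With the real data it would be called as `is_outlier_year(fre18$TID, fre18$INDHOLD, 2007)`.

```r
# Flag a year as an outlier if it lies more than k standard deviations
# from the mean of the other years (a rule of thumb, not a formal test)
is_outlier_year <- function(years, values, year, k = 2) {
  target <- values[years == year]
  others <- values[years != year]
  abs(target - mean(others)) > k * sd(others)
}

# Toy values for illustration only
toy_years  <- 2006:2010
toy_values <- c(60, 63, 58, 61, 120)

is_outlier_year(toy_years, toy_values, 2007)
is_outlier_year(toy_years, toy_values, 2010)
```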